Affordable Fault Tolerance Through Adaptation

Authors

  • Ilwoo Chang
  • Matti A. Hiltunen
  • Richard D. Schlichting
Abstract

Fault-tolerant programs are typically not only difficult to implement but also incur extra costs in performance or resource consumption. Failures are usually rare, yet the fault-tolerance overhead must be paid regardless of whether any failures occur during program execution. This paper presents an approach that reduces the cost of fault tolerance: adapting to a change in the failure model. In particular, a program that assumes no failures (or only benign failures) is combined with a component responsible for detecting whether failures occur and then switching to a fault-tolerant algorithm. Provided that the detection and adaptation mechanisms are not too expensive, this approach yields a program with smaller fault-tolerance overhead, and thus better performance, than a traditional fault-tolerant program. The high cost of fault tolerance is paid only when failures actually occur.
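The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the detection or adaptation mechanisms, so this example assumes a cheap result check as the failure detector and triple execution with majority voting as the fault-tolerant algorithm. The names `AdaptiveExecutor`, `make_flaky_square`, and the `check` callback are hypothetical.

```python
from collections import Counter
from typing import Callable


class AdaptiveExecutor:
    """Runs a computation once per call until a failure is detected,
    then switches permanently to replicated execution with voting.

    Assumptions (not from the paper): the detector is a cheap result
    check, and the fault-tolerant algorithm is majority voting.
    """

    def __init__(self, compute: Callable[[int], int],
                 check: Callable[[int, int], bool], replicas: int = 3):
        self.compute = compute
        self.check = check            # hypothetical detector: True if result looks valid
        self.replicas = replicas
        self.fault_tolerant = False   # start in the optimistic, failure-free mode
        self.executions = 0           # number of compute calls, a stand-in for cost

    def _run_once(self, x: int) -> int:
        self.executions += 1
        return self.compute(x)

    def _run_voted(self, x: int) -> int:
        # Fault-tolerant mode: execute several times and take the majority result.
        results = [self._run_once(x) for _ in range(self.replicas)]
        return Counter(results).most_common(1)[0][0]

    def run(self, x: int) -> int:
        if not self.fault_tolerant:
            result = self._run_once(x)
            if self.check(x, result):
                return result
            # Detection: the failure-free assumption no longer holds, so adapt.
            self.fault_tolerant = True
        return self._run_voted(x)


def make_flaky_square(fail_on_call: int) -> Callable[[int], int]:
    """Hypothetical faulty component: squares x, but returns -1 on one chosen call."""
    calls = 0

    def square(x: int) -> int:
        nonlocal calls
        calls += 1
        return -1 if calls == fail_on_call else x * x

    return square


if __name__ == "__main__":
    executor = AdaptiveExecutor(make_flaky_square(3), lambda x, r: r >= 0)
    print([executor.run(x) for x in (1, 2, 3, 4)])  # [1, 4, 9, 16]
    print(executor.executions)                      # 9, versus 12 if always voting
```

In this run the first two inputs cost one execution each; the fault on the third call triggers the switch, after which every input costs three executions. The saving depends entirely on the check being cheap and on failures being rare, which matches the proviso in the abstract.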

Similar resources

Delivering Affordable Fault-tolerance to Commodity Computer Systems

By Shuguang Feng

THICA: A Fault E

Parallel processing power is now within reach through affordable cluster computing models. Fault tolerance, along with throughput improvement, is the prime research challenge of cluster computing. A new task rearrangement of cluster nodes has been implemented to increase the degree of fault tolerance of the existing cluster computing model built on top of MPI. The proposed m...

Fault Tolerance for Multiprocessor Systems Via Time Redundant Task Scheduling

Fault tolerance is often considered a good additional feature for multiprocessor systems, but nowadays it is becoming an essential attribute. Fault tolerance can be achieved with dedicated customized hardware, which may have the disadvantage of high cost. Another approach to fault tolerance is to exploit existing redundancy in multiprocessor systems via a task scheduling software stra...

The Chameleon Infrastructure for Adaptive, Software Implemented Fault Tolerance

This paper presents Chameleon, an adaptive software infrastructure for supporting different levels of availability requirements in a heterogeneous networked environment. Chameleon provides dependability through the use of ARMORs—Adaptive, Reconfigurable, and Mobile Objects for Reliability. Three broad classes of ARMORs are defined: Managers, Daemons, and Common ARMORs. Key concepts that support...

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...

Publication date: 1998